Online Policy Improvement in Large POMDPs via an Error Minimization Search
Authors
Abstract
Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework for planning under uncertainty. However, most real-world systems are modelled by huge POMDPs that cannot be solved due to their high complexity. To alleviate this difficulty, we propose combining existing offline approaches with an online search process, called AEMS, that can locally improve an approximate policy computed offline by reducing its error and providing better performance guarantees. We propose different heuristics to guide this search process, and provide theoretical guarantees on convergence to ε-optimal solutions. Our experimental results show that our approach can provide better solution quality within a smaller overall time than state-of-the-art algorithms and allows for interesting online/offline computation tradeoffs.
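The kind of error-minimizing online search the abstract describes can be illustrated with a toy sketch. This is not the paper's actual AEMS algorithm: the `Node` fields, the expansion model, and the bound dynamics below are illustrative assumptions. The core idea shown is the heuristic of always expanding the fringe belief node whose bound gap, weighted by its reachability probability and discounted by its depth, contributes most to the root's error.

```python
import heapq

GAMMA = 0.95  # discount factor (assumed for this sketch)

class Node:
    """A fringe node of the belief search tree (illustrative fields)."""
    def __init__(self, prob, depth, lower, upper):
        self.prob = prob    # probability of reaching this belief from the root
        self.depth = depth  # depth in the search tree
        self.lower = lower  # lower bound on the value at this belief
        self.upper = upper  # upper bound on the value at this belief

    def error_contribution(self):
        # AEMS-style heuristic: reachability-weighted, discounted bound gap.
        return self.prob * (GAMMA ** self.depth) * (self.upper - self.lower)

def aems_refine(fringe, expand, epsilon, max_expansions=1000):
    """Repeatedly expand the fringe node with the largest estimated
    contribution to the root's error, until the total remaining error
    bound drops below epsilon. Returns the remaining error bound."""
    heap = [(-n.error_contribution(), i, n) for i, n in enumerate(fringe)]
    heapq.heapify(heap)
    counter = len(fringe)  # tie-breaker so Nodes are never compared directly
    for _ in range(max_expansions):
        total = -sum(neg for neg, _, _ in heap)
        if total <= epsilon:
            break
        _, _, node = heapq.heappop(heap)
        for child in expand(node):
            heapq.heappush(heap, (-child.error_contribution(), counter, child))
            counter += 1
    return -sum(neg for neg, _, _ in heap)

# Hypothetical expansion model: each expansion splits a node into two
# observation branches with halved reach probability and tighter bounds.
def toy_expand(node):
    mid = (node.lower + node.upper) / 2
    return [Node(node.prob * 0.5, node.depth + 1, node.lower, mid),
            Node(node.prob * 0.5, node.depth + 1, mid, node.upper)]

remaining = aems_refine([Node(1.0, 0, 0.0, 10.0)], toy_expand, epsilon=0.5)
```

Because each expansion replaces a node's error contribution with a discounted, strictly smaller sum from its children, the total error bound shrinks toward zero, which is the intuition behind the ε-optimality guarantees mentioned above.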
Similar resources
AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs
Solving large Partially Observable Markov Decision Processes (POMDPs) is a complex task which is often intractable. A lot of effort has been made to develop approximate offline algorithms to solve ever larger POMDPs. However, even state-of-the-art approaches fail to solve large POMDPs in reasonable time. Recent developments in online POMDP search suggest that combining offline computations with ...
Online Planning Algorithms for POMDPs
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, due to its complexity, solving a POMDP is often intractable except for small problems. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the e...
Theoretical Analysis of Heuristic Search Methods for Online POMDPs
Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the ...
Incremental Policy Iteration with Guaranteed Escape from Local Optima in POMDP Planning
Partially observable Markov decision processes (POMDPs) provide a natural framework to design applications that continuously make decisions based on noisy sensor measurements. The recent proliferation of smart phones and other wearable devices leads to new applications where, unfortunately, energy efficiency becomes an issue. To circumvent energy requirements, finite-state controllers can be ap...
FHHOP: A Factored Hybrid Heuristic Online Planning Algorithm for Large POMDPs
Planning in partially observable Markov decision processes (POMDPs) remains a challenging topic in the artificial intelligence community, in spite of recent impressive progress in approximation techniques. Previous research has indicated that online planning approaches are promising in handling large-scale POMDP domains efficiently as they make decisions “on demand” instead of proactively for t...
Journal:
Volume, Issue:
Pages: -
Publication date: 2007